Learning to Parse from a Treebank: Combining TBL and ILP
Abstract
Parsing natural language, with its substantial structural complexity and ambiguity, has proved to be a hard problem. While most attempts in this area so far have relied on hand-built parsers, the difficulties inherent in the manual construction of natural language grammars have led to efforts to induce grammars automatically. Our approach to automatic grammar induction, presented in this paper, has resulted in the design and implementation of the system GRIND (Grammar Induction), which is capable of learning a sequence of context-dependent parse actions from a given corpus of labelled derivation trees. To this end, GRIND combines two established machine learning methods: transformation-based learning (TBL) and inductive logic programming (ILP). Trained and tested on the SUSANNE corpus, GRIND reached an accuracy of 96% and a recall of 68%.
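To make "a sequence of parse actions from a corpus of labelled derivation trees" concrete, the following is a minimal sketch of how one training sequence could be read off a single tree. It assumes a plain shift/reduce action inventory and a nested-tuple tree encoding; both are illustrative assumptions, and the abstract does not state which actions or tree format GRIND actually uses. The function names `oracle_actions` and `replay` are hypothetical.

```python
from typing import List, Tuple, Union

# Assumed encoding: a tree is either a word (str) or a (label, children) pair.
Tree = Union[str, Tuple[str, list]]


def oracle_actions(tree: Tree) -> List[str]:
    """Return the shift/reduce sequence that rebuilds `tree` bottom-up.

    Assumption: leaves are shifted one at a time, and each labelled node
    becomes one "reduce LABEL N" action over its N children.
    """
    if isinstance(tree, str):
        return ["shift"]
    label, children = tree
    actions: List[str] = []
    for child in children:
        actions.extend(oracle_actions(child))
    actions.append(f"reduce {label} {len(children)}")
    return actions


def replay(words: List[str], actions: List[str]) -> Tree:
    """Apply an action sequence to a word list and return the resulting tree."""
    stack: list = []
    queue = list(words)
    for action in actions:
        if action == "shift":
            stack.append(queue.pop(0))
        else:
            # Labels are assumed to contain no spaces.
            _, label, count = action.split()
            n = int(count)
            children = stack[-n:]
            del stack[-n:]
            stack.append((label, children))
    if queue or len(stack) != 1:
        raise ValueError("action sequence does not yield a single tree")
    return stack[0]


# Toy example (not taken from SUSANNE).
toy_tree = ("S", [("NP", ["dogs"]), ("VP", ["bark"])])
toy_actions = oracle_actions(toy_tree)
rebuilt = replay(["dogs", "bark"], toy_actions)
```

Replaying the extracted actions over the sentence reproduces the original tree, which is the property a learner of context-dependent actions would rely on: each action, paired with the stack and input context in which it was taken, is one training example.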
Similar resources
Automated Parser Construction from a Treebank by means of TBL and ILP
Considering the difficulties inherent in the manual construction of natural language parsers, we have designed and implemented the system GRIND, which is capable of learning a sequence of context-dependent parsing actions from an arbitrary corpus containing labelled parse trees. Trained and tested on the SUSANNE corpus, GRIND reaches an accuracy of 96% and a recall of 68%.
Inducing Deterministic Prolog Parsers from Treebanks: A Machine Learning Approach
This paper presents a method for constructing deterministic Prolog parsers from corpora of parsed sentences. Our approach uses recent machine learning methods for inducing Prolog rules from examples (inductive logic programming). We discuss several advantages of this method compared to recent statistical methods and present results on learnin...
Combining LAPIS and WordNet for Learning of LR Parsers with Optimal Semantic Constraints
There is a history of research focussed on learning shift-reduce parsers from syntactically annotated corpora by means of machine learning techniques based on logic. The presence of lexical semantic tags in the treebank has proved useful for the learning of semantic constraints limiting the amount of nondeterminism in the parsers. The grain of the semantic tags used is of direct importan...
Unsupervised Parse Selection for HPSG
Parser disambiguation with precision grammars generally takes place via statistical ranking of the parse yield of the grammar using a supervised parse selection model. In the standard process, the parse selection model is trained over a hand-disambiguated treebank, meaning that without a significant investment of effort to produce the treebank, parse selection is not possible. Furthermore, as t...
Treeblazing: Using External Treebanks to Filter Parse Forests for Parse Selection and Treebanking
We describe “treeblazing”, a method of using annotations from the GENIA treebank to constrain a parse forest from an HPSG parser. Combining this with self-training, we show significant dependency score improvements in a task of adaptation to the biomedical domain, reducing error rate by 9% compared to out-of-domain gold data and 6% compared to self-training. We also demonstrate improvements in ...